Measuring Text Readability by Lexical Relations Retrieved from Wordnet

Authors

  • Shu-Yen Lin
  • Cheng-Chao Su
  • Yu-Da Lai
  • Li-Chin Yang
  • Shu-Kai Hsieh
Abstract

Current readability formulae have often been criticized as unstable or invalid. Most are derived by regression analysis over intuitively chosen variables and graded readings. This study explores the relation between text readability and the conceptual categories proposed in Prototype Theory. These categories form a hierarchy: basic level words like guitar represent the objects humans interact with most readily. Children acquire them earlier than their superordinate words (or hypernyms) like stringed instrument and their subordinate words (or hyponyms) like acoustic guitar. The readability of a text is therefore presumably associated with the ratio of basic level words it contains. WordNet, a network of meaningfully related words, provides the best online open source database for studying such lexical relations. Our preliminary studies show that a basic level word can be identified by how frequently it forms compounds (e.g. chair → armchair) and by the average length difference between it and its hyponyms. We compared selected high school English textbook readings in terms of their basic level word ratios and their scores on several readability formulae. The basic level word ratio turned out to be the only measure positively correlated with the text levels.
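The two cues named in the abstract, compound formation and average length difference from hyponyms, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hyponym table `TOY_HYPONYMS` is a hypothetical stand-in for WordNet lookups, and the thresholds, the way the two cues are combined, and the choice to count every token in the denominator are all assumptions made for illustration.

```python
import re

# Hypothetical toy hyponym table standing in for WordNet lookups (assumption).
TOY_HYPONYMS = {
    "chair": ["armchair", "wheelchair", "highchair"],
    "guitar": ["acoustic guitar", "bass guitar"],
    "furniture": ["chair", "table", "bed"],
}


def compound_ratio(word, hyponyms):
    """Share of hyponyms that are compounds containing the word itself."""
    if not hyponyms:
        return 0.0
    compounds = sum(1 for h in hyponyms if word in h and h != word)
    return compounds / len(hyponyms)


def mean_length_difference(word, hyponyms):
    """Average of len(hyponym) - len(word), counted in characters."""
    if not hyponyms:
        return 0.0
    return sum(len(h) - len(word) for h in hyponyms) / len(hyponyms)


def is_basic_level(word, table, min_compound_ratio=0.5):
    """Flag a word as basic level when both cues agree.

    The 0.5 threshold and the AND-combination are illustrative assumptions;
    the abstract does not state how the cues are weighted.
    """
    hyponyms = table.get(word, [])
    return (
        compound_ratio(word, hyponyms) >= min_compound_ratio
        and mean_length_difference(word, hyponyms) > 0
    )


def basic_level_ratio(text, table):
    """Fraction of tokens in the text flagged as basic level words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    basic = sum(1 for t in tokens if is_basic_level(t, table))
    return basic / len(tokens)


if __name__ == "__main__":
    print(basic_level_ratio("The chair is furniture", TOY_HYPONYMS))
```

In this toy table, chair is flagged because all of its hyponyms are longer compounds built on it, while furniture is not, since none of its hyponyms contain it. A real experiment would replace the table with hyponym lookups against WordNet and would likely restrict the count to nouns or content words.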

Similar resources

Assessing Text Readability Using Hierarchical Lexical Relations Retrieved from WordNet

Although some traditional readability formulas have shown high predictive validity in the r = 0.8 range and above (Chall & Dale, 1995), they are generally not based on genuine linguistic processing factors, but on statistical correlations (Crossley et al., 2008). Improvement of readability assessment should focus on finding variables that truly represent the comprehensibility of text as well as...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Creation of Lexical Relations for IndoWordNet

WordNet is an electronic lexical database available online as a powerful resource for researchers in computational linguistics, text processing, and related areas. A WordNet for Hindi has already been developed by IIT Bombay. WordNets for other Indian languages are being created from the Hindi WordNet using the expansion approach under the IndoWordNet project. In expansion approach, ...

Building a network of topical relations from a corpus

Lexical networks such as WordNet are known to have a lack of topical relations although these relations are very useful for tasks such as text summarization or information extraction. In this article, we present a method for automatically building from a large corpus a lexical network whose relations are preferably topical ones. As it does not rely on resources such as dictionaries, this method...

Non-Classical Lexical Semantic Relations

NLP methods and applications need to take account not only of “classical” lexical relations, as found in WordNet, but the less-structural, more context-dependent “non-classical” relations that readers intuit in text. In a reader-based study of lexical relations in text, most were found to be of the latter type. The relationships themselves are analyzed, and consequences for NLP are discussed.

Publication date: 2008